feat: add support for writing _SUCCESS files in parquet operations #6090
base: main
Greptile Overview
Greptile Summary
Added support for writing `_SUCCESS` files in parquet operations.
Key Changes:
Minor Issue:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant DataFrame
participant LogicalPlanBuilder
participant CommitWriteSink
participant IOClient
participant Storage
User->>DataFrame: write_parquet(root_dir, write_success_file=True)
DataFrame->>LogicalPlanBuilder: write_tabular(write_success_file=True)
LogicalPlanBuilder->>LogicalPlanBuilder: table_write() creates OutputFileInfo
Note over LogicalPlanBuilder: OutputFileInfo stores write_success_file flag
DataFrame->>DataFrame: collect() - executes plan
DataFrame->>CommitWriteSink: finalize(states)
alt Overwrite Mode
CommitWriteSink->>IOClient: glob files in root_dir
IOClient->>Storage: list existing files
Storage-->>IOClient: file list
IOClient-->>CommitWriteSink: existing files
CommitWriteSink->>Storage: delete old files
end
alt write_success_file is true
CommitWriteSink->>IOClient: get_source(root_uri)
IOClient-->>CommitWriteSink: source
CommitWriteSink->>Storage: put("/_SUCCESS", empty bytes)
alt Success
Storage-->>CommitWriteSink: ok
else Failure
CommitWriteSink->>CommitWriteSink: log::warn(error)
end
end
CommitWriteSink-->>DataFrame: written file paths
DataFrame-->>User: DataFrame with file paths
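The finalize step in the diagram above can be sketched in plain Python. This is a minimal illustration on the local filesystem, not the PR's implementation: `finalize_write` and its parameters are hypothetical names, and the real sink goes through an IO client rather than `pathlib`. It mirrors the two branches shown: in overwrite mode, pre-existing files are removed, and when `write_success_file` is set, an empty `_SUCCESS` marker is written, with a failure logged as a warning instead of failing the write.

```python
import logging
from pathlib import Path

logger = logging.getLogger("commit_write_sketch")


def finalize_write(root_dir, written_files, overwrite=False, write_success_file=False):
    """Hypothetical stand-in for the finalize step (local filesystem only)."""
    root = Path(root_dir)
    keep = {Path(p).resolve() for p in written_files}

    if overwrite:
        # Overwrite mode: list existing files and delete those not just written.
        for existing in root.rglob("*"):
            if existing.is_file() and existing.resolve() not in keep:
                existing.unlink()

    if write_success_file:
        try:
            # The marker is an empty file named _SUCCESS in the root directory.
            (root / "_SUCCESS").write_bytes(b"")
        except OSError as err:
            # As in the diagram, a marker failure is only logged.
            logger.warning("failed to write _SUCCESS marker: %s", err)

    return [str(p) for p in written_files]
```

Because the marker is written after any cleanup, a stale `_SUCCESS` file left by a previous run is removed in overwrite mode before the new one is created.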
1 file reviewed, 1 comment
Additional Comments (1)
Prompt To Fix With AI
This is a comment left during a code review.
Path: daft/dataframe/dataframe.py
Line: 794:795
Comment:
Missing `write_success_file` parameter in the docstring. The parameter should be documented.
```suggestion
write_mode (str, optional): Operation mode of the write. `append` will add new data, `overwrite` will replace the contents of the root directory with new data. `overwrite-partitions` will replace only the contents in the partitions that are being written to. Defaults to "append".
write_success_file (bool, optional): Whether to write a `_SUCCESS` file upon successful completion. Defaults to False.
```
How can I resolve this? If you propose a fix, please make it concise.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #6090       +/-   ##
===========================================
- Coverage   72.91%   43.41%   -29.51%
===========================================
  Files         973      909       -64
  Lines      126196   112753    -13443
===========================================
- Hits        92016    48948    -43068
- Misses      34180    63805    +29625
Changes Made
df.write_parquet("/path/to/output", write_success_file=True)
Related Issues
#4085
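
For context on why the marker is useful, here is a rough sketch of how a downstream job might gate on it before reading the output of a `write_parquet(..., write_success_file=True)` call. The helper name `list_committed_files` is hypothetical, and the sketch assumes a local output directory; object stores would need a listing call from the relevant client instead.

```python
from pathlib import Path


def list_committed_files(root_dir):
    """Return parquet files under root_dir only if the _SUCCESS marker exists.

    Hypothetical helper for illustration; not part of Daft.
    """
    root = Path(root_dir)
    if not (root / "_SUCCESS").is_file():
        # No marker: treat the output as not (yet) committed.
        return []
    return sorted(str(p) for p in root.rglob("*.parquet"))
```

Since the sequence diagram shows that a failure to write the marker is only logged as a warning, a missing `_SUCCESS` file does not by itself prove that the data write failed; a consumer like this one is deliberately conservative.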